GISMO—gene identification using a support vector machine for ORF classification
Abstract
We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, for short genes, and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are biologically active genes. The source code for GISMO is freely available under the GPL.
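As a rough illustration of what "composition-based" features for an ORF could look like, the sketch below turns a nucleotide sequence into a fixed-length vector of in-frame codon frequencies, the kind of input a support vector machine might be trained on. This is an assumption for illustration only: the abstract does not say which composition features GISMO uses, and the helper name `codon_frequencies` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every ORF maps to a vector of equal length.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(orf):
    """Return relative in-frame codon frequencies for an ORF.

    Hypothetical helper for illustration; not taken from the GISMO source.
    """
    seq = orf.upper()
    # Read in-frame triplets and ignore any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Skip triplets containing ambiguous bases such as N.
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy ORF with four codons: ATG, AAA, GCG, TAA.
features = codon_frequencies("ATGAAAGCGTAA")
```

Each candidate ORF yields one 64-dimensional vector; a classifier would then be trained on vectors from ORFs labelled as coding or non-coding, for example ones with and without protein domain hits.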
Journal title:
Volume 35, issue
Pages -
Publication date: 2007